policy graph


MDP modeling for multi-stage stochastic programs

Morton, David P., Dowson, Oscar, Pagnoncelli, Bernardo K.

arXiv.org Artificial Intelligence

We study a class of multi-stage stochastic programs, which incorporate modeling features from Markov decision processes (MDPs). This class includes structured MDPs with continuous state and action spaces. We extend policy graphs to include decision-dependent uncertainty for one-step transition probabilities as well as a limited form of statistical learning. We focus on the expressiveness of our modeling approach, illustrating ideas with a series of examples of increasing complexity. As a solution method, we develop new variants of stochastic dual dynamic programming, including approximations to handle non-convexities.
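A policy graph of this kind can be miniaturized in code. In the sketch below, the `Node` class, the toy inventory-style dynamics, and the two-stage example are all invented for illustration, not the paper's formulation: each node carries a stage cost and an action-dependent one-step transition distribution over child nodes, which is decision-dependent uncertainty in its simplest form.

```python
import random

class Node:
    def __init__(self, stage_cost, transition):
        self.stage_cost = stage_cost   # (state, action) -> cost
        self.transition = transition   # action -> {child name: probability}

def rollout(nodes, root, policy, state, rng):
    """Simulate one pass through the policy graph under a fixed policy."""
    total, current = 0.0, root
    while True:
        node = nodes[current]
        action = policy(current, state)
        total += node.stage_cost(state, action)
        state -= action                        # toy dynamics: consume stock
        probs = node.transition(action)        # may depend on the decision
        if not probs:                          # leaf node: rollout ends
            return total
        children, weights = zip(*probs.items())
        current = rng.choices(children, weights=weights)[0]

# Two-stage chain t1 -> t2, paying `action` units of cost at each stage.
nodes = {
    "t1": Node(lambda s, a: a, lambda a: {"t2": 1.0}),
    "t2": Node(lambda s, a: a, lambda a: {}),
}
cost = rollout(nodes, "t1", policy=lambda n, s: 2, state=5, rng=random.Random(0))
```

A solution method such as SDDP would replace the fixed `policy` with one derived from value-function cuts at each node; the rollout skeleton stays the same.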


IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

Levi, Elad, Kadar, Ilan

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are transforming artificial intelligence, evolving into task-oriented systems capable of autonomous planning and execution. One of the primary applications of LLMs is conversational AI systems, which must navigate multi-turn dialogues, integrate domain-specific APIs, and adhere to strict policy constraints. However, evaluating these agents remains a significant challenge, as traditional methods fail to capture the complexity and variability of real-world interactions. We introduce IntellAgent, a scalable, open-source multi-agent framework designed to evaluate conversational AI systems comprehensively. IntellAgent automates the creation of diverse, synthetic benchmarks by combining policy-driven graph modeling, realistic event generation, and interactive user-agent simulations. This innovative approach provides fine-grained diagnostics, addressing the limitations of static and manually curated benchmarks with coarse-grained metrics. IntellAgent represents a paradigm shift in evaluating conversational AI. By simulating realistic, multi-policy scenarios across varying levels of complexity, IntellAgent captures the nuanced interplay of agent capabilities and policy constraints. Unlike traditional methods, it employs a graph-based policy model to represent relationships, likelihoods, and complexities of policy interactions, enabling highly detailed diagnostics. IntellAgent also identifies critical performance gaps, offering actionable insights for targeted optimization. Its modular, open-source design supports seamless integration of new domains, policies, and APIs, fostering reproducibility and community collaboration. Our findings demonstrate that IntellAgent serves as an effective framework for advancing conversational AI by addressing challenges in bridging research and deployment. The framework is available at https://github.com/plurai-ai/intellagent
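The graph-based policy model described above can be sketched, in a much-reduced form, as a weighted graph sampled by random walk so that strongly linked policies tend to be tested together. The policy names, edge weights, and the `sample_scenario` helper below are all invented for illustration; this is not the IntellAgent API.

```python
import random

POLICY_GRAPH = {
    "refund":       {"id_check": 0.8, "escalation": 0.2},
    "id_check":     {"refund": 0.5, "data_privacy": 0.5},
    "escalation":   {"data_privacy": 1.0},
    "data_privacy": {},
}

def sample_scenario(graph, start, max_policies, rng):
    """Random-walk the graph to pick a set of co-occurring policies."""
    scenario, current = [start], start
    while len(scenario) < max_policies and graph[current]:
        nxt = rng.choices(list(graph[current]),
                          weights=list(graph[current].values()))[0]
        if nxt not in scenario:
            scenario.append(nxt)
        current = nxt
    return scenario

scenario = sample_scenario(POLICY_GRAPH, "refund", 3, random.Random(0))
```

Each sampled policy set would then seed a synthetic dialog and an interactive user-agent simulation in the full framework.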


Intention-aware policy graphs: answering what, how, and why in opaque agents

Gimenez-Abalos, Victor, Alvarez-Napagao, Sergio, Tormos, Adrian, Cortés, Ulises, Vázquez-Salceda, Javier

arXiv.org Artificial Intelligence

However, precisely because of the definition of such a task, the result is an artefact that, unless explicitly designed to be transparent, is often not interpretable or, hence, trustworthy (Zhang et al., 2021; Lipton, 2017). This is where the field of Explainable Artificial Intelligence (XAI) shines. A model explanation is an exercise in communication between a sender or source (i.e. the model or one of its components) and a receiver (i.e. the explainee, a human or another processor for a downstream task) that describes the relevant context or the causes surrounding some facts (Lewis, 1986; Miller, 2019; Wright, 2004), which in the context of AI is often related to its final or intermediate outputs or decisions. Any such communicative act can be considered an explanation, but not all explanations may be useful or even desirable. According to empirical studies (Slugoski et al., 1993), it can be argued that the form of an explanation must depend on its function as an answer to a question within a conversational framework. Furthermore, in the words of Herbert Paul Grice (Grice, 1975), for a communicative act to be useful, four maxims should be followed:

1. Manner: the message or explanans should be comprehensible and clear to the receiver, which within the context of XAI is often referred to as interpretability (Lipton, 2017).
2. Quality: the message contains truthful information; in the context of XAI, reliability or explanation verification (Zhou et al., 2021b; Slack et al., 2021; Arias-Duart et al., 2022).
3. Quantity: the length of a message should be just enough to be informative, often a heuristic implicitly agreed upon in the design of explainable systems, which depends on both the sender and the code it uses.
4. Relation: the explanation should be relevant to the given context, significant when one can keep searching for causes of causes beyond the scope of relevance.


AnyTOD: A Programmable Task-Oriented Dialog System

Zhao, Jeffrey, Cao, Yuan, Gupta, Raghav, Lee, Harrison, Rastogi, Abhinav, Wang, Mingqiu, Soltau, Hagen, Shafran, Izhak, Wu, Yonghui

arXiv.org Artificial Intelligence

We propose AnyTOD, an end-to-end, zero-shot task-oriented dialog (TOD) system capable of handling unseen tasks without task-specific training. We view TOD as a program executed by a language model (LM), where program logic and ontology are provided by a designer as a schema. To enable generalization to unseen schemas and programs without prior training, AnyTOD adopts a neuro-symbolic approach. A neural LM keeps track of events occurring during a conversation, and a symbolic program implementing the dialog policy is executed to recommend the next actions AnyTOD should take. This approach drastically reduces data annotation and model training requirements, addressing the enduring challenge of rapidly adapting a TOD system to unseen tasks and domains. We demonstrate state-of-the-art results on the STAR, ABCD, and SGD benchmarks. We also demonstrate strong zero-shot transfer ability in low-resource settings, such as zero-shot transfer to MultiWOZ. In addition, we release STARv2, an updated version of the STAR dataset with richer annotations, for benchmarking zero-shot end-to-end TOD models.
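The neuro-symbolic split can be illustrated in miniature. In the toy below, the schema, slots, and events are invented: in AnyTOD the neural LM would emit structured events from the conversation, whereas here they are hand-written, and a plain Python function stands in for the symbolic dialog-policy program.

```python
def dialog_policy(events):
    """Recommend the next system action from the tracked events."""
    slots = {e["slot"]: e["value"] for e in events if e["type"] == "inform"}
    if "date" not in slots:
        return ("request", "date")
    if "party_size" not in slots:
        return ("request", "party_size")
    if not any(e["type"] == "confirm" for e in events):
        return ("confirm", slots)
    return ("book", slots)

events = [{"type": "inform", "slot": "date", "value": "Friday"}]
action = dialog_policy(events)   # -> ('request', 'party_size')
```

Because the policy is an ordinary program rather than learned weights, swapping in a new task means swapping in a new schema and policy function, with no retraining.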


General Policies, Serializations, and Planning Width

Bonet, Blai, Geffner, Hector

arXiv.org Artificial Intelligence

It has been observed that in many of the benchmark planning domains, atomic goals can be reached with a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width. Such problems indeed have bounded width: a width that does not grow with the number of problem variables and is often no greater than two. Yet, while the notion of width has become part of state-of-the-art planning algorithms like BFWS, there is still no good explanation for why so many benchmark domains have bounded width. In this work, we address this question by relating bounded width and serialized width to ideas of generalized planning, where general policies aim to solve multiple instances of a planning problem all at once. We show that bounded width is a property of planning domains that admit optimal general policies in terms of features that are explicitly or implicitly represented in the domain encoding. The results are extended to a much larger class of domains with bounded serialized width, where the general policies do not have to be optimal. The study also leads to a new, simple, meaningful, and expressive language for specifying domain serializations in the form of policy sketches, which can be used for encoding domain control knowledge by hand or for learning it from traces. The use of sketches and the meaning of the theoretical results are illustrated through a number of examples.
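IW(1), the width-1 member of the procedure mentioned above, is simple enough to sketch: a breadth-first search that prunes every state failing the novelty-1 test, i.e. a state is expanded only if it makes some atom true for the first time in the search. The toy domain below (a counter with inc/dec actions) is an invented illustration, not one of the paper's benchmarks.

```python
from collections import deque

def iw1(initial, goal_atom, successors, atoms):
    seen_atoms = set(atoms(initial))
    frontier = deque([(initial, [])])
    while frontier:
        state, plan = frontier.popleft()
        if goal_atom in atoms(state):
            return plan
        for action, nxt in successors(state):
            new = set(atoms(nxt)) - seen_atoms
            if not new:
                continue                  # pruned: nothing novel
            seen_atoms |= new
            frontier.append((nxt, plan + [action]))
    return None

atoms = lambda s: [("at", s)]                       # state -> its atoms
succ = lambda s: [("inc", s + 1), ("dec", s - 1)]   # labelled successors
plan = iw1(0, ("at", 3), succ, atoms)               # -> ['inc', 'inc', 'inc']
```

The search touches at most one state per atom, which is why the runtime is polynomial in the number of atoms rather than exponential in the number of states.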


Technical Report: The Policy Graph Improvement Algorithm

Pajarinen, Joni

arXiv.org Artificial Intelligence

Optimizing a partially observable Markov decision process (POMDP) policy is challenging. The policy graph improvement (PGI) algorithm for POMDPs represents the policy as a fixed-size policy graph and improves the policy monotonically. Due to the fixed policy size, the computation time for each improvement iteration is known in advance. Moreover, the method allows for compact, understandable policies. This report describes the technical details of the PGI [1] and particle-based PGI [2] algorithms for POMDPs in a more accessible way than [1] or [2], allowing practitioners and students to understand and implement the algorithms.
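The policy-graph representation, and the evaluation half of such an algorithm, can be sketched as follows (PGI's monotone improvement step is omitted, and the two-state toy POMDP below is invented): each graph node stores an action and, per observation, a successor node, and the value of every (node, state) pair follows from repeated Bellman backups over the fixed graph.

```python
GAMMA = 0.9
STATES, OBS, ACTS = [0, 1], [0, 1], [0, 1]
# Deterministic toy model: the state never changes and is fully revealed
# by the observation; action a is rewarded only in the matching state.
P = {s: {a: {s2: 1.0 if s2 == s else 0.0 for s2 in STATES}
         for a in ACTS} for s in STATES}
O = {s2: {a: {o: 1.0 if o == s2 else 0.0 for o in OBS}
          for a in ACTS} for s2 in STATES}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}

GRAPH = {"A": (0, {0: "A", 1: "B"}),   # node: (action, observation -> node)
         "B": (1, {0: "A", 1: "B"})}

def evaluate(graph, iters=200):
    """Value of each (node, state) pair under the fixed policy graph."""
    V = {q: {s: 0.0 for s in STATES} for q in graph}
    for _ in range(iters):
        V = {q: {s: R[s][a] + GAMMA * sum(
                     P[s][a][s2] * sum(O[s2][a][o] * V[edges[o]][s2]
                                       for o in OBS)
                     for s2 in STATES)
                 for s in STATES}
             for q, (a, edges) in graph.items()}
    return V

V = evaluate(GRAPH)   # V['A'][0] converges to 10.0 (= 1 / (1 - GAMMA))
```

An improvement step would then re-pick each node's action and edges to maximize this value, holding the rest of the graph fixed, which is what makes the per-iteration cost predictable.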


Column Generation Algorithms for Constrained POMDPs

Walraven, Erwin, Spaan, Matthijs T. J.

Journal of Artificial Intelligence Research

In several real-world domains it is required to plan ahead while there are finite resources available for executing the plan. The limited availability of resources imposes constraints on the plans that can be executed, which need to be taken into account while computing a plan. A Constrained Partially Observable Markov Decision Process (Constrained POMDP) can be used to model resource-constrained planning problems which include uncertainty and partial observability. Constrained POMDPs provide a framework for computing policies which maximize expected reward, while respecting constraints on a secondary objective such as cost or resource consumption. Column generation for linear programming can be used to obtain Constrained POMDP solutions. This method incrementally adds columns to a linear program, in which each column corresponds to a POMDP policy obtained by solving an unconstrained subproblem. Column generation requires solving a potentially large number of POMDPs, as well as exact evaluation of the resulting policies, which is computationally difficult. We propose a method to solve subproblems in a two-stage fashion using approximation algorithms. First, we use a tailored point-based POMDP algorithm to obtain an approximate subproblem solution. Next, we convert this approximate solution into a policy graph, which we can evaluate efficiently. The resulting algorithm is a new approximate method for Constrained POMDPs in single-agent settings, but also in settings in which multiple independent agents share a global constraint. Experiments based on several domains show that our method outperforms the current state of the art.
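The master problem in this scheme can be shown in miniature (the numbers below are toys, not from the paper). Each column i is a policy with expected reward r_i and expected cost c_i, and the master LP picks a mixture maximizing reward subject to a cost budget; with a single constraint, some optimal mixture uses at most two policies, so this tiny master can be solved by enumeration. In the real method, new columns come from POMDP subproblems priced by the master's dual values; here the columns are simply given.

```python
from itertools import combinations

def best_mixture(columns, budget):
    best = (float("-inf"), None)
    for (r1, c1) in columns:                    # pure (single-policy) picks
        if c1 <= budget:
            best = max(best, (r1, ((r1, c1), 1.0)))
    for p1, p2 in combinations(columns, 2):     # two-policy mixtures
        (r1, c1), (r2, c2) = p1, p2
        if c1 == c2:
            continue
        w = (budget - c2) / (c1 - c2)           # weight making cost == budget
        if 0.0 <= w <= 1.0:
            best = max(best, (w * r1 + (1 - w) * r2, (p1, w)))
    return best

columns = [(10.0, 8.0), (3.0, 2.0), (7.0, 5.0)]   # (reward, cost) per policy
value, mix = best_mixture(columns, budget=6.0)    # value is approximately 8.0
```

Randomizing between an expensive high-reward policy and a cheap one, as in the result here, is exactly the kind of mixture the LP relaxation licenses.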


Solving DEC-POMDPs by Expectation Maximization of Value Function

Song, Zhao (Duke University) | Liao, Xuejun (Duke University) | Carin, Lawrence (Duke University)

AAAI Conferences

We present a new algorithm called PIEM to approximately solve for the policy of an infinite-horizon decentralized partially observable Markov decision process (DEC-POMDP). The algorithm uses expectation maximization (EM) only in the step of policy improvement, with policy evaluation achieved by solving the Bellman equation in terms of finite-state controllers (FSCs). This marks a key distinction of PIEM from the previous EM algorithm of Kumar and Zilberstein (2010): PIEM directly operates on a DEC-POMDP without transforming it into a mixture of dynamic Bayes nets. Thus, PIEM precisely maximizes the value function, avoiding complicated forward/backward message passing and the corresponding computational and memory cost. To overcome local optima, we follow Pajarinen and Peltonen (2011) in solving the DEC-POMDP for a finite-length horizon and using the resulting policy graph to initialize the FSCs. We solve the finite-horizon problem using a modified point-based policy generation (PBPG) algorithm, providing a closed-form solution for a subproblem that the original PBPG solved by linear programming. Experimental results on benchmark problems show that the proposed algorithms compare favorably to state-of-the-art methods.


Learning Finite-State Controllers for Partially Observable Environments

Meuleau, Nicolas, Peshkin, Leonid, Kim, Kee-Eung, Kaelbling, Leslie Pack

arXiv.org Artificial Intelligence

Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time-step.
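The stochastic-gradient idea can be reduced to a few lines, though heavily simplified: the sketch below is plain REINFORCE on a one-node stochastic controller with two actions and one-step episodes, not the VAPS extension itself, which would also keep softmax parameters for node transitions. The environment (action 0 always pays 1, action 1 pays 0) is invented to keep the example tiny.

```python
import math, random

def softmax(theta):
    z = [math.exp(t - max(theta)) for t in theta]
    total = sum(z)
    return [x / total for x in z]

def train(steps=500, lr=0.2, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]                 # action preferences at the one node
    for _ in range(steps):
        p = softmax(theta)
        a = rng.choices([0, 1], weights=p)[0]
        reward = 1.0 if a == 0 else 0.0
        for i in range(2):             # gradient of log pi(a | theta)
            theta[i] += lr * reward * ((1.0 if i == a else 0.0) - p[i])
    return softmax(theta)

p = train()   # p[0] ends up close to 1: the controller learns action 0
```

Because the update is an unbiased sample of the true gradient, the same loop converges (to a local optimum) when rewards and transitions are stochastic, which is the setting the paper analyzes.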


Solving POMDPs by Searching the Space of Finite Policies

Meuleau, Nicolas, Kim, Kee-Eung, Kaelbling, Leslie Pack, Cassandra, Anthony R.

arXiv.org Artificial Intelligence

Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite state automata of a given size. This problem is also intractable, but we show that the complexity can be greatly reduced when the POMDP and/or policy are further constrained. We demonstrate good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient-ascent method for finding locally optimal stochastic policies.